SUMMARY


The paper examines some properties of inference about the point in
a sequence of random variables at which a distribution change occurs.
Three particular aspects are considered: asymptotic theory, approximations
to test power, and use of discriminant functions other than the
likelihood ratio. The discussion is illustrated by some examples.


1.  INTRODUCTION 


In three recent papers, Hinkley (1970, 1971) and Hinkley and Hinkley
(1970) have discussed likelihood inference for the unknown change-point
parameter $\xi$ in the model

$$\mathcal{L}(X_i) = \begin{cases} F(x, \theta) & i = 1, \ldots, \xi \\ G(x, \psi) & i = \xi+1, \ldots, T \end{cases}$$

for the sequence $(X_1, \ldots, X_T)$ of independent random variables.
Particular emphasis was placed on the normal mean-shift case and the binomial
proportion-shift case, for which detailed numerical results were given.
However, some simple general results were overlooked. In the present
paper we attempt to remedy that situation.

Section 2 is concerned primarily with likelihood ratio tests of
significance about $\xi$, and a brief summary of previous work is given
in Section 2.1. Then we derive simple approximations to the power functions
of the tests using approximations for the tails of the null distributions.
For this discussion we assume $\theta$ and $\psi$ to be known, with $\xi$
and $T-\xi$ assumed large. Asymptotically in $\xi$ and $T-\xi$ the same results
apply with $\theta$ and $\psi$ unknown, as we prove in Section 2.3 using standard
asymptotic results for likelihood estimation. Some examples of the power
approximations are given in Section 2.4.

We may regard the change-point model as defining a classification
problem, in which we have to classify all observations $X_1, \ldots, X_T$.
We have a strong ordering principle through which $\xi$ identifies the
correct classification. This point of view suggests consideration of
discriminant functions other than the likelihood ratio, since the latter
may be inconvenient for certain situations. One familiar discriminant
function is the cumulative sum chart developed by Page (1954). In
Section 3 we generalize the likelihood results of Section 2 to general
discriminant functions, with some examples in Section 3.4.

Some  brief  remarks  on  relevant  aspects  of  the  problem  are  made  in 
Section  4. 
2.  LIKELIHOOD INFERENCE ABOUT $\xi$

The general parametric single-change model for a sequence of independent
vector random variables $(X_1, \ldots, X_T)$ may be written as

$$\mathcal{L}(X_j) = \begin{cases} F(x, \theta) & j = 1, \ldots, \xi \\ G(x, \psi) & j = \xi+1, \ldots, T, \end{cases} \tag{2.1}$$

where $\xi$ is unknown, $F$ and $G$ are known distribution functions with
possibly unknown vector parameters $\theta$ and $\psi$ respectively, and
$F(x, \theta) \neq G(x, \psi)$. We should note that (2.1) may be an alternative
expression for the model

$$\mathcal{L}(X \mid u) = \begin{cases} F(x, \theta) & u \le \eta \\ G(x, \psi) & u > \eta, \end{cases} \tag{2.2}$$

where observations on $X$ are taken at values
$u_1 < \ldots < u_\xi \le \eta < u_{\xi+1} < \ldots < u_T$ of an independent
variable $u$. Estimation of $\xi$ or $\eta$ implies classification of the
observations, which is one possible objective of statistical analysis.
Another objective is a test of significance on $\xi$ or $\eta$, or a
confidence interval for $\xi$ or $\eta$. When $\eta$ is the parameter of
interest in this latter case, our discussion of the model (2.1) is directly
relevant to spacing of the $u_j$ to meet prior requirements on power of
inference.

In this section we consider properties of likelihood inference about
$\xi$. The discussion incorporates a summary of previous work (Hinkley,
1970) together with new results on large-sample theory and approximations
to distributions of likelihood ratio statistics.
2.1  Likelihood Inference for known $\theta$ and $\psi$

Suppose for the moment that $F$ and $G$ have discrete or continuous
densities $f$ and $g$ respectively, and that $\theta$ and $\psi$ are known. Then
if $\xi_0$ is the true value of $\xi$, the log likelihood $L(\xi)$ of $\xi$ under
the model (2.1) satisfies

$$L(\xi) - L(\xi_0) = \begin{cases} \sum_{j=1}^{\xi-\xi_0} Y_j & (\xi_0 < \xi < T) \\ \sum_{j=1}^{\xi_0-\xi} Z_j & (1 \le \xi < \xi_0), \end{cases} \tag{2.3}$$

where

$$Y_j = \log\{f(X_{\xi_0+j}, \theta)/g(X_{\xi_0+j}, \psi)\}, \qquad Z_j = \log\{g(X_{\xi_0+1-j}, \psi)/f(X_{\xi_0+1-j}, \theta)\}. \tag{2.4}$$

Thus $L(\xi)$ generates two independent random walks each with iid
increments. It is apparent from (2.4) that $E(Y_j) < 0$ and $E(Z_j) < 0$ if
these expectations exist. In what follows it is convenient to assume
their existence, and so we shall assume $f$ and $g$ to have identical
support; that is, the set $\{x : f(x, \theta) = 0\}$ has zero probability under
$g(x, \psi)$ and vice versa. An example where this is not the case will be
described in Section 2.4.

The likelihood ratio statistics for testing $H: \xi = \xi^*$ versus the
one-sided alternatives $H^-: \xi < \xi^*$ and $H^+: \xi > \xi^*$ are respectively

$$S^- = \max_{\xi < \xi^*} L(\xi) - L(\xi^*), \qquad S^+ = \max_{\xi > \xi^*} L(\xi) - L(\xi^*). \tag{2.5}$$

For the two-sided alternative the test statistic is $S = \max(S^-, S^+)$.
Therefore when $H$ is true, that is $\xi^* = \xi_0$, we can write

$$S^- = U = \max_{n \ge 1} \sum_{j=1}^{n} Z_j, \qquad S^+ = V = \max_{n \ge 1} \sum_{j=1}^{n} Y_j, \tag{2.6}$$

and $S = \max(U, V)$.


The maximum likelihood estimate $\hat\xi$ will satisfy

$$\hat\xi = \begin{cases} \xi_0 - n & (U = \sum_{j=1}^{n} Z_j > 0,\ U > V) \\ \xi_0 & (U \le 0,\ V \le 0) \\ \xi_0 + n & (V = \sum_{j=1}^{n} Y_j > 0,\ V > U), \end{cases} \tag{2.7}$$

with necessary modification if multiple maxima of $L(\xi)$ can occur.
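To make (2.3)-(2.7) concrete, the following is a minimal Python sketch that evaluates $L(\xi)$ for known densities, locates the maximum likelihood estimate, and forms the one-sided statistics of (2.5). The function names, the restriction to $\xi = 1, \ldots, T-1$, and the unit-variance normal densities in the usage lines are assumptions made for illustration only.

```python
import math

def log_likelihood_path(xs, log_f, log_g):
    """Return [L(1), ..., L(T-1)] with
    L(xi) = sum_{j<=xi} log f(x_j) + sum_{j>xi} log g(x_j)."""
    lf = [log_f(x) for x in xs]
    lg = [log_g(x) for x in xs]
    total_g = sum(lg)
    path, running = [], 0.0
    for j in range(len(xs) - 1):
        running += lf[j] - lg[j]          # cumulative log likelihood ratio
        path.append(total_g + running)
    return path

def change_point_mle(xs, log_f, log_g):
    """Value of xi in 1..T-1 maximising L(xi) (first maximiser if tied)."""
    path = log_likelihood_path(xs, log_f, log_g)
    return max(range(len(path)), key=path.__getitem__) + 1

def one_sided_statistics(xs, log_f, log_g, xi_star):
    """S^- and S^+ of (2.5) for a hypothesised change point xi_star."""
    path = log_likelihood_path(xs, log_f, log_g)
    base = path[xi_star - 1]
    lower = [v - base for v in path[:xi_star - 1]]
    upper = [v - base for v in path[xi_star:]]
    s_minus = max(lower) if lower else float("-inf")
    s_plus = max(upper) if upper else float("-inf")
    return s_minus, s_plus

def normal_log_density(mean):
    """Log density of N(mean, 1); an illustrative choice of f and g."""
    return lambda x: -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

xs = [0.1, -0.3, 0.2, 0.0, 2.1, 1.8, 2.2, 1.9]
f, g = normal_log_density(0.0), normal_log_density(2.0)
xi_hat = change_point_mle(xs, f, g)
s_minus, s_plus = one_sided_statistics(xs, f, g, xi_star=4)
```

Because only differences of $L(\xi)$ matter, the path is built from the running sum of $\log f - \log g$, which is exactly the random walk structure described in (2.3).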

The distribution of $\hat\xi$ is awkward to calculate, and the inefficiency
of $\hat\xi$ as a test statistic (Hinkley, 1971; Hinkley and Hinkley, 1970)
has encouraged us to pay little attention to it in this paper.

We shall assume that $\xi_0$ and $T-\xi_0$ are infinitely large, so that
$U$ and $V$ are maxima of interminate random walks. The distributions
of $U$ and $V$ converge exponentially in $\xi_0$ and $T-\xi_0$. In terms of
the model (2.2), we are assuming the $u_j$ to become increasingly dense
on either side of $u = \eta$ for $u$ in some bounded interval.

We shall denote the distribution functions of $Y$ and $Z$ by $H_Y(y)$
and $H_Z(z)$, while the distribution functions of $U$ and $V$ will be
denoted by $P_U(u)$ and $P_V(v)$. Then since $E(Y) < 0$ and $E(Z) < 0$
both $U$ and $V$ are finite with probability one and their distribution
functions satisfy

$$P_V(v) = \int_0^\infty P_V(y)\, dH_Y(v-y), \qquad P_U(u) = \int_0^\infty P_U(y)\, dH_Z(u-y). \tag{2.8}$$

We also have that

$$P_V(0) = \exp\Bigl\{-\sum_{n=1}^{\infty} n^{-1}\, \mathrm{pr}\Bigl(\sum_{j=1}^{n} Y_j > 0\Bigr)\Bigr\}, \qquad P_U(0) = \exp\Bigl\{-\sum_{n=1}^{\infty} n^{-1}\, \mathrm{pr}\Bigl(\sum_{j=1}^{n} Z_j > 0\Bigr)\Bigr\}. \tag{2.9}$$

A result central to our discussion is the following. If there
exists a positive $\omega_Y$ such that $E\{\exp(\omega_Y Y)\} = 1$, then

$$P_V(v) \sim 1 - c_Y \exp(-\omega_Y v) \qquad (v \to \infty) \tag{2.10}$$

for some constant $c_Y$; see Feller (1966, p. 392). Therefore if we
denote the (asymptotic) test size for $S^+$ by $\beta_0^+(v)$, then

$$\beta_0^+(v) = 1 - P_V(v) \sim c_Y \exp(-v), \tag{2.11}$$

because $\omega_Y = 1$ when $Y$ is the log likelihood ratio (2.4). We should
note that $S^+$ is stochastically increasing in $T-\xi_0$, so that the test
size increases monotonically to $\beta_0^+(v)$ as $T-\xi_0$ tends to infinity.
The corresponding result for asymptotic test size of $S^-$ is

$$\beta_0^-(u) = 1 - P_U(u) \sim c_Z \exp(-u).$$
In general the constant $c_Y$ in (2.10) cannot be evaluated analytically,
but a method of numerical computation due to W. M. Gentleman may
be summarized as follows. Consider (2.8) in the form

$$1 - P_V(v) = 1 - H_Y(v) + \int_0^A \{1 - P_V(x)\}\, dH_Y(v-x) + c_Y \int_A^\infty e^{-x}\, dH_Y(v-x), \tag{2.12}$$

which for large $A$ will be a good approximation. Now for some finite
set of numbers $x_0, \ldots, x_n$ such that $0 = x_0 < x_1 < \ldots < x_n = A$,
replace the first integral in (2.12) by the approximating trapezoidal
sum over the intervals $(x_0, x_1), \ldots, (x_{n-1}, x_n)$. Then given $c_Y$
and $A$, (2.12) induces a set of linear equations for $1 - P_V(x_i)$
$(i = 0, 1, \ldots, n)$. Suppose we solve these equations for trial values
of $c_Y$ and $A$, and denote the solutions by $a_0, a_1, \ldots, a_n$. If
$a_i \exp(x_i)$ is approximately constant for $x_i$ near $A$, then $A$ is
suitably large. The trial value of $c_Y$ is approximately correct if
$a_n \exp(A)$ is equal to the trial value. When $a_n \exp(A)$ is not close
to the trial value of $c_Y$, a second estimate of $c_Y$ is made close to
$a_n \exp(A)$, and the "solution-inspection" process is repeated. For a
numerical example of this procedure the reader is referred to Hinkley
(1970).
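The discretisation just described can be sketched in Python for the normal mean-shift case, where $Y$ is $N(-d^2/2, d^2)$ (see Section 2.4). This is only a sketch in the spirit of the procedure, not the original program: because the trapezoidal equations are linear in $c_Y$, the consistency requirement $c_Y = a_n \exp(A)$ is imposed directly by solving the system for two right-hand sides, rather than by repeated trial and inspection. The grid ($A = 6$, 120 steps), the function names, and the closed form used for the tail integral are assumptions of this sketch.

```python
import math

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def solve_linear(matrix, columns):
    """Solve matrix * x = column for each column (Gaussian elimination
    with partial pivoting, pure Python)."""
    n = len(matrix)
    aug = [row[:] + [col[i] for col in columns] for i, row in enumerate(matrix)]
    width = n + len(columns)
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            if factor != 0.0:
                for c in range(k, width):
                    aug[r][c] -= factor * aug[k][c]
    solutions = []
    for j in range(len(columns)):
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            rest = sum(aug[i][c] * x[c] for c in range(i + 1, n))
            x[i] = (aug[i][n + j] - rest) / aug[i][i]
        solutions.append(x)
    return solutions

def tail_constant_normal(d, upper=6.0, steps=120):
    """Estimate c_Y in (2.10) for increments Y ~ N(-d^2/2, d^2)."""
    h = upper / steps
    xs = [i * h for i in range(steps + 1)]
    weights = [h] * (steps + 1)
    weights[0] = weights[-1] = 0.5 * h            # trapezoidal rule on [0, A]
    mean, sd = -0.5 * d * d, d
    # (I - K W) a = direct + c * tail, with K[i][k] the density of Y at x_i - x_k
    matrix = [[(1.0 if i == k else 0.0)
               - weights[k] * norm_pdf((xs[i] - xs[k] - mean) / sd) / sd
               for k in range(steps + 1)] for i in range(steps + 1)]
    direct = [1.0 - norm_cdf((x - mean) / sd) for x in xs]    # 1 - H_Y(x_i)
    # closed form of int_A^inf exp(-x) dH_Y(x_i - x) for this normal H_Y
    tail = [math.exp(-x) * norm_cdf(-(upper - x + 0.5 * d * d) / d) for x in xs]
    a_direct, a_tail = solve_linear(matrix, [direct, tail])
    scale = math.exp(upper)
    # a_n = a_direct[n] + c * a_tail[n]; impose c = a_n * exp(A) and solve for c
    return scale * a_direct[-1] / (1.0 - scale * a_tail[-1])

c_estimate = tail_constant_normal(1.0)
```

The returned value can be compared with the entry for $d = 1.0$ in Table 2.1; the check that $a_i \exp(x_i)$ is roughly constant near $A$ is omitted here and would be worth adding before trusting a given choice of $A$.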

We should emphasize that (2.11) will provide a good approximation
only for the upper tail probabilities of $S^+$. So for small test size
$\alpha$, the critical value of the statistic $S^+$ is approximately $\log(c_Y/\alpha)$.
Some numerical values of $c_Y$ for the normal mean-shift case are given
in Section 2.4.
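As a small worked instance of this rule, the sketch below inverts (2.11) for the critical value. The function name is hypothetical, and the coefficient used in the usage line is the $d = 1.0$ entry of Table 2.1.

```python
import math

def approx_critical_value(c_y, alpha):
    """Approximate v solving c_y * exp(-v) = alpha, from (2.11).
    Only meaningful for small alpha (upper tail), so alpha < c_y is required."""
    if not (0.0 < alpha < c_y):
        raise ValueError("approximation needs 0 < alpha < c_y")
    return math.log(c_y / alpha)

# normal mean-shift case with d = 1.0 and a 5% one-sided test
v = approx_critical_value(0.56030, 0.05)
```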
The classical inversion of significance tests to construct confidence
intervals is non-trivial for $\xi$, since the log likelihood $L(\xi)$
does not decrease monotonically away from $\xi = \hat\xi$. Hence if a lower
confidence limit $\xi_\ell$ is defined in the usual way to be

$$\xi_\ell = \inf\{\xi : \max_{t > \xi} L(t) - L(\xi) \le v\},$$

with nominal confidence coefficient $P_V(v)$, then $\mathrm{pr}(\xi_\ell \le \xi_0) > P_V(v)$.
In fact we find that

$$\mathrm{pr}(\xi_\ell > \xi_0) = \mathrm{pr}(V > v,\ U < V - v) = \int_v^\infty P_U(x - v)\, dP_V(x).$$

For large $v$, (2.10) implies that

$$\mathrm{pr}(\xi_\ell > \xi_0) \approx \beta_0^+(v) \int_0^\infty P_U(x)\, e^{-x}\, dx = \beta_0^+(v)\Bigl\{P_U(0) + \int_0^\infty e^{-x}\, dP_U(x)\Bigr\}$$

where 


00 

=  £(v)  P^O)  {1  +  exp  XZ  dn/n)}  , 

n=l 


d 

n 


pr  (x  _<  >  Y  <  x+dx)  . 

3=1  J 


This last expression follows from (2.13) of Hinkley (1970) and can also
be deduced from (2.18). The point is that the lower-bounded confidence
set $\{\xi : \max_{t > \xi} L(t) - L(\xi) \le v\}$ does not contain all $\xi$ greater
than $\xi_\ell$. Incidentally, we should observe that

$$\xi_\ell = \inf\{\xi : L(\hat\xi) - L(\xi) \le v \ \text{and}\ \xi \le \hat\xi\}.$$

For large confidence coefficient $1-\alpha$, $v \approx \log(c_Y/\alpha)$.

Similar remarks apply to confidence intervals derived from the
test statistics $S^-$ and $S$.

Finally we note that the asymptotic probability of no misclassification
is

$$\mathrm{pr}(\hat\xi = \xi_0) = P_U(0)\, P_V(0), \tag{2.14}$$

which can be calculated from (2.9). One important characteristic of
$\hat\xi$ is that $\hat\xi - \xi_0 = O_p(1)$.
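For a case where the probabilities in (2.9) are available in closed form, (2.14) is easy to evaluate. The Python sketch below assumes the normal mean-shift case of Section 2.4, in which $Y_j$ and $Z_j$ are both $N(-d^2/2, d^2)$, so that $\mathrm{pr}(\sum_{j \le n} Y_j > 0) = \Phi(-d\sqrt{n}/2)$. The function names and the truncation of the infinite sum at a fixed number of terms are assumptions of this sketch; the truncation is adequate only when $d$ is not very small.

```python
import math

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_walk_never_positive(d, terms=2000):
    """P_V(0) from (2.9) for iid N(-d^2/2, d^2) increments:
    exp{-sum_n n^-1 * Phi(-d sqrt(n) / 2)}, truncated after `terms` terms."""
    total = sum(normal_cdf(-0.5 * d * math.sqrt(n)) / n
                for n in range(1, terms + 1))
    return math.exp(-total)

def prob_no_misclassification(d):
    """P_U(0) * P_V(0) of (2.14); U and V have the same law in this case."""
    p = prob_walk_never_positive(d)
    return p * p

p_one = prob_walk_never_positive(1.0)
p_exact_classification = prob_no_misclassification(1.0)
```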


2.2  Power of likelihood inference for known $\theta$ and $\psi$

In the previous section we examined the null distributions of the
likelihood ratio statistics $S^-$ and $S^+$. Now suppose that $\xi^* = \xi_0 - r$
with $r > 0$ and consider $S^+$. By (2.3) and (2.5) we see that

$$S^+ = \max\Bigl(-Z_r,\ -Z_r - Z_{r-1},\ \ldots,\ -\sum_{j=1}^{r} Z_j,\ -\sum_{j=1}^{r} Z_j + V\Bigr). \tag{2.15}$$

Looking first at the case $r = 1$, (2.15) implies that

$$\beta_{-1}^+(v) = \mathrm{pr}(S^+ > v \mid \xi^* = \xi_0 - 1) = 1 - \int_{-v}^{\infty} P_V(v + z)\, dH_Z(z). \tag{2.16}$$

But if $v$ is large, or if $E(Z)$ and $\mathrm{var}(Z)$ are small, then $H_Z(z)$
will concentrate its mass well above $z = -v$ and (2.16) will be

$$\approx 1 - \int_{-\infty}^{\infty} \{1 - c_Y \exp(-v - z)\}\, dH_Z(z).$$

Therefore

$$\beta_{-1}^+(v) \sim \beta_0^+(v)\, E\{\exp(-Z)\} \tag{2.17}$$

as $v \to \infty$ or $\|F - G\| \to 0$. For general positive $r$ a similar argument
applies to the generalization of (2.16) and gives

$$\beta_{-r}^+(v) \sim \beta_0^+(v)\, [E\{\exp(-Z)\}]^r. \tag{2.18}$$

This expression is more familiar in the special case $g(x, \psi) = f(x, \theta + \delta)$
with $\delta \to 0$, when formal expansion of $E\{\exp(-Z)\}$ and substitution
in (2.18) leads to

$$\beta_{-r}^+(v) \sim \beta_0^+(v)\{1 + r\, \delta' I_f(\theta)\, \delta\} + o(\|\delta\|^2). \tag{2.19}$$

Here $I_f(\theta)$ is the Fisher information matrix for $f(x, \theta)$.

The result for $S^-$ corresponding to (2.18) is

$$\beta_r^-(u) = \mathrm{pr}(S^- > u \mid \xi^* = \xi_0 + r) \sim \beta_0^-(u)\, [E\{\exp(-Y)\}]^r$$

as $u \to \infty$ and $\|F - G\| \to 0$ with $r > 0$.

It remains to consider $S^+$ when $\xi^* > \xi_0$ and $S^-$ when $\xi^* < \xi_0$.
But it is immediate from (2.3) and (2.5) that

$$S^+ = V \quad (\xi^* > \xi_0) \qquad \text{and} \qquad S^- = U \quad (\xi^* < \xi_0),$$

from which it follows that the one-sided likelihood ratio tests are
unbiased. We shall see in Section 3 that they are not uniformly most
powerful.
Approximations such as (2.18) and (2.19) may be useful in the
model (2.2). For suppose the spacing of the independent variable $u$
is constant, that is $u_{j+1} - u_j = \epsilon$. If we wish to test $H: \eta = \eta^*$
against $H': \eta > \eta^*$ with (small) test size $\alpha$ and fixed power $\pi$ at
$\eta = \eta^* + \gamma$, then for example (2.19) indicates that we should have

$$\epsilon \approx \frac{\gamma\, \{\delta' I_f(\theta)\, \delta\}\, \alpha}{\pi - \alpha}.$$
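The spacing rule follows from (2.19) by taking $r = \gamma/\epsilon$ observations between the hypothesised and true change points and solving $\pi = \alpha\{1 + r\,\delta' I_f(\theta)\,\delta\}$ for $\epsilon$. A minimal Python sketch of that arithmetic is given below; the function names and the numerical inputs are hypothetical, and since only the first-order form (2.19) is used, (2.18) would be the safer basis whenever $r\,\delta' I_f(\theta)\,\delta$ is not small.

```python
def spacing_for_power(gamma, info_quadratic, alpha, power):
    """Spacing epsilon ~ gamma * (delta' I delta) * alpha / (power - alpha),
    where info_quadratic stands for the scalar delta' I_f(theta) delta."""
    if not (0.0 < alpha < power < 1.0):
        raise ValueError("need 0 < alpha < power < 1")
    return gamma * info_quadratic * alpha / (power - alpha)

def first_order_power(r, info_quadratic, alpha):
    """Right-hand side of (2.19): alpha * (1 + r * delta' I delta)."""
    return alpha * (1.0 + r * info_quadratic)

# hypothetical inputs: shift gamma = 1, delta' I delta = 0.04, size 5%, power 20%
eps = spacing_for_power(1.0, 0.04, 0.05, 0.20)
check = first_order_power(1.0 / eps, 0.04, 0.05)   # recovers the target power
```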

Some numerical examples of (2.18) are given in Section 2.4. We
shall make use of these power approximations to describe the efficiency
of alternative test statistics in Section 3.

2.3  Convergence of likelihood statistics for unknown $\theta$ and $\psi$

In the discussion thus far we assumed $\theta$ and $\psi$ to be known, in
order to take advantage of the random walk representation for $L(\xi)$.
When $\theta$ and $\psi$ are unknown, it is easy to verify that the asymptotic
distributions of the likelihood statistics are unchanged. To demonstrate
this we shall use familiar results on consistency of maximum likelihood
estimates.

For simplicity let $\theta$ and $\psi$ be one-dimensional. Then we may
summarize our assumptions on $F$ and $G$ as follows.

Assumption 2.1. The family $\{F(x, \theta), \theta \in \Theta;\ G(x, \psi), \psi \in \Psi\}$ satisfies
the consistency assumptions given by Wald (1949), with the understanding
that continuity conditions apply to the sub-families $\{F(x, \theta), \theta \in \Theta\}$
and $\{G(x, \psi), \psi \in \Psi\}$ if these have no intersection. Included in this
assumption is the fact that $F$ and $G$ are either both discrete or
both absolutely continuous.
Assumption 2.1 could be relaxed to include the more general cases
discussed by Kiefer and Wolfowitz (1956), for example. The proofs of
results in this section can be generalized, just as Wald's consistency
theorem was generalized.

With $\theta$ and $\psi$ unknown, let $\hat\theta_\xi$ and $\hat\psi_\xi$ be the maximum
likelihood estimates of $\theta$ and $\psi$ conditional on $\xi$. Then the marginal
log likelihood of $\xi$ is

$$\bar L(\xi) = \sum_{j=1}^{\xi} \log f(X_j, \hat\theta_\xi) + \sum_{j=\xi+1}^{T} \log g(X_j, \hat\psi_\xi). \tag{2.20}$$

To distinguish the maximum likelihood estimates of $\xi$ when $\theta$ and $\psi$
are known and unknown, they will be denoted by $\hat\xi$ and $\bar\xi$ respectively.
The likelihood ratio statistic for testing $H: \xi = \xi^*$ against the
two-sided alternative $H_a: \xi \neq \xi^*$ is

$$\bar S = \max_{\xi \neq \xi^*} \bar L(\xi) - \bar L(\xi^*)$$

when $\theta$ and $\psi$ are unknown, corresponding to $S$ when $\theta$ and $\psi$ are
known. We are also interested in $\theta$ and $\psi$, whose maximum likelihood
estimators are $\hat\theta = \hat\theta_{\bar\xi}$ and $\hat\psi = \hat\psi_{\bar\xi}$. The true values of $\theta$ and $\psi$
will be denoted by $\theta_0$ and $\psi_0$.

Once $\hat\theta$ and $\hat\psi$ are determined, the log likelihood may be modified
to

$$\tilde L(\xi) = \sum_{j=1}^{\xi} \log f(X_j, \hat\theta) + \sum_{j=\xi+1}^{T} \log g(X_j, \hat\psi),$$

and corresponding statistics $\tilde\xi$ and $\tilde S$ calculated. We shall use $\tilde L(\xi)$
to establish the asymptotic properties of $\bar\xi$, $\bar S$, $\hat\theta$ and $\hat\psi$.
Without loss of generality we suppose that $\xi_0 = \lambda T$ for some fixed
$\lambda$ $(0 < \lambda < 1)$, so that asymptotic results are for $T \to \infty$. The random walk
structure of $L(\xi)$ implies $\hat\xi - \xi_0 = O_p(1)$ and $S = O_p(1)$ if
$\xi^* - \xi_0 = O(1)$, so the asymptotic distributions of $(\hat\xi, S)$, $(\bar\xi, \bar S)$
and $(\tilde\xi, \tilde S)$ are the same if we prove the following theorem:

Theorem 2.1. Under Assumption 2.1, (i) $\bar\xi - \hat\xi = o_p(1)$ and
(ii) $\bar S - S = o_p(1)$ for $\xi^* - \xi_0 = O(1)$ as $T \to \infty$. Also
(iii) $\tilde\xi - \hat\xi = o_p(1)$ and (iv) $\tilde S - S = o_p(1)$ for
$\xi^* - \xi_0 = O(1)$ as $T \to \infty$.


To prove the theorem, we show first that $\bar\xi - \xi_0 = O_p(1)$. By the
definition of $\bar L(\xi)$ in (2.20), we have that for $r > 0$

$$\bar L(\xi_0 + r) \le \bar L(\xi_0) + \sup_{\theta \in \Theta} \sum_{i=\xi_0+1}^{\xi_0+r} \log\{f(X_i, \theta)/g(X_i, \psi_0)\} + \sum_{i=\xi_0+1}^{T} \log\{g(X_i, \psi_0)/g(X_i, \hat\psi_{\xi_0})\} + \sum_{i=\xi_0+r+1}^{T} \log\{g(X_i, \hat\psi_{\xi_0+r})/g(X_i, \psi_0)\}$$

$$= \bar L(\xi_0) + A_r + B_T + C_{r,T},$$

say. Now $B_T \le 0$, $C_{r,T}$ is finite with probability one, and $A_r \to -\infty$
with probability one by Assumption 2.1. It follows that if $r_0(T) \to \infty$
as $T \to \infty$, then

$$\lim_{T \to \infty} \mathrm{pr}\{\bar L(\xi_0 + r) < \bar L(\xi_0),\ r = r_0(T), \ldots, T - \xi_0\} = 1.$$

Similarly we find that if $r_1(T) \to \infty$ as $T \to \infty$ then

$$\lim_{T \to \infty} \mathrm{pr}\{\bar L(\xi_0 - r) < \bar L(\xi_0),\ r = r_1(T), \ldots, \xi_0 - 1\} = 1.$$

But since $r_0(T)$ and $r_1(T)$ are arbitrary and $\bar L(\bar\xi) \ge \bar L(\xi_0)$, we
deduce that

$$\bar\xi - \xi_0 = O_p(1). \tag{2.21}$$


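Next we need to prove that $\hat\theta$ and $\hat\psi$ are consistent. For $\hat\theta$
this means that for any closed set $\omega \subset \Theta - \theta_0$,

$$\sup_{\theta \in \omega} \sum_{j=1}^{\bar\xi} \log\{f(X_j, \theta)/f(X_j, \theta_0)\} = \log o_p(1). \tag{2.22}$$

By definition, (2.22) is less than or equal to

$$\sup_{\theta \in \omega} \sum_{j=1}^{\xi_0} \log\{f(X_j, \theta)/f(X_j, \theta_0)\} + \sup_{\theta \in \omega} {\sum_{j=\xi_0+1}^{\bar\xi}}^{\!*} \log\{f(X_j, \theta)/f(X_j, \theta_0)\}, \tag{2.23}$$

where ${\sum_{j=a+1}^{b}}^{*}$ means $\sum_{j=a+1}^{b}$ if $b > a$, and $-\sum_{j=b+1}^{a}$ if $b < a$. The
first term in (2.23) is $\log o_p(1)$ by Assumption 2.1, and (2.21) implies
that the second term is $O_p(1)$. This proves (2.22) and with the
corresponding argument for $\hat\psi$ we get

$$\hat\theta - \theta_0 = o_p(1) \quad \text{and} \quad \hat\psi - \psi_0 = o_p(1). \tag{2.24}$$

We note without proof that

$$\hat\theta_\xi - \theta_0 = o_p(1) \quad \text{and} \quad \hat\psi_\xi - \psi_0 = o_p(1) \qquad (\xi - \xi_0 = O(1)). \tag{2.25}$$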

We can now prove (i) and (ii) of Theorem 2.1. The definitions of
$\bar L(\xi)$, $L(\xi)$ and $\tilde L(\xi)$ imply that for any $\xi$

$$\bar L(\xi) - \bar L(\xi^*) \le L(\xi) - L(\xi^*) + {\sum_{j=\xi^*+1}^{\xi}}^{\!*} \log\{f(X_j, \hat\theta_\xi)\, g(X_j, \psi_0) / f(X_j, \theta_0)\, g(X_j, \hat\psi_\xi)\} \tag{2.26}$$

and

$$\bar L(\xi) - \bar L(\xi^*) \ge L(\xi) - L(\xi^*) + {\sum_{j=\xi^*+1}^{\xi}}^{\!*} \log\{f(X_j, \hat\theta_{\xi^*})\, g(X_j, \psi_0) / f(X_j, \theta_0)\, g(X_j, \hat\psi_{\xi^*})\}. \tag{2.27}$$

But if $\xi$ and $\xi^*$ are both $\xi_0 + O(1)$, the sums in (2.26) and (2.27) contain $O(1)$
terms, each of which is $o_p(1)$ by (2.24) and (2.25). It follows that
for any $\epsilon > 0$ and $\xi = \xi_0 + O(1)$,

$$\lim_{T \to \infty} \mathrm{pr}\{|\bar L(\xi) - \bar L(\xi^*) - L(\xi) + L(\xi^*)| < \epsilon\} = 1. \tag{2.28}$$

A similar argument shows that for any $\epsilon > 0$ and $\xi^* = \xi_0 + O(1)$,

$$\lim_{T \to \infty} \mathrm{pr}\{L(\bar\xi) - L(\xi^*) > \bar L(\bar\xi) - \bar L(\xi^*) - \epsilon\} = 1, \tag{2.29}$$

which together with (2.28) gives

$$\lim_{T \to \infty} \mathrm{pr}\{L(\bar\xi) - L(\xi^*) > L(\hat\xi) - L(\xi^*) - 2\epsilon\} = 1.$$

But since $L(\bar\xi) \le L(\hat\xi)$, with equality if and only if $\bar\xi = \hat\xi$, part (i)
of Theorem 2.1 is proved. Also (2.28) proves part (ii).

Parts (iii) and (iv) may be proved by applying the same arguments
to $\tilde L(\xi)$ and $\tilde\xi$, once it is shown that $\tilde\xi - \xi_0 = O_p(1)$. We omit the
details here.

Finally we have the following result for $\hat\theta$ and $\hat\psi$.

Theorem 2.2. If Assumption 2.1 holds, and if $f$ and $g$ are such that
the limiting distributions of $\sqrt{\xi_0}\,(\hat\theta_{\xi_0} - \theta_0)$ and $\sqrt{T - \xi_0}\,(\hat\psi_{\xi_0} - \psi_0)$
are normal, then $\sqrt{\xi_0}\,(\hat\theta - \theta_0)$ and $\sqrt{T - \xi_0}\,(\hat\psi - \psi_0)$ also have those
limiting normal distributions.
It suffices to consider $\hat\theta$. From the definitions of $\hat\theta$ and $\hat\theta_{\xi_0}$ we
get

$${\sum_{j=\xi_0+1}^{\bar\xi}}^{\!*} \log\{f(X_j, \hat\theta_{\xi_0})/f(X_j, \hat\theta)\} \le \sum_{j=1}^{\xi_0} \log\{f(X_j, \hat\theta)/f(X_j, \hat\theta_{\xi_0})\} \le 0.$$

Hence by (2.21), (2.24) and (2.25) we see that

$$\sum_{j=1}^{\xi_0} \log\{f(X_j, \hat\theta)/f(X_j, \hat\theta_{\xi_0})\} = o_p(1).$$

But the asymptotic normality of $\hat\theta_{\xi_0}$ implies only

$$\sum_{j=1}^{\xi_0} \log\{f(X_j, \hat\theta_{\xi_0})/f(X_j, \theta_0)\} = O_p(1),$$

so that $\hat\theta - \hat\theta_{\xi_0} = o_p(\xi_0^{-1/2})$. This proves the desired result.

Note that by (2.21) the asymptotic normality in Theorem 2.2 holds
with $\xi_0$ replaced by $\bar\xi$.


2.4  Some  examples 

In  Sections  2.1  and  2.2  we  derived  some  simple  limiting  expressions 
for  the  power  and  size  of  likelihood  ratio  tests.  To  illustrate  these 
results  we  look  at  the  normal,  binomial  and  rectangular  distributions. 

When $F$ and $G$ are the multivariate normal distributions $MN(\theta, \Sigma)$
and $MN(\psi, \Sigma)$ respectively, so that a mean-shift takes place, we have

$$L(\xi) = \text{sample constant} + \sum_{j=1}^{\xi} (\theta - \psi)' \Sigma^{-1} \{X_j - \tfrac{1}{2}(\theta + \psi)\}. \tag{2.30}$$

The random walks (2.3) each have iid increments which are $N(-d^2/2, d^2)$
with $d^2 = (\theta - \psi)' \Sigma^{-1} (\theta - \psi)$. This particular problem is discussed
at length by Hinkley (1970, 1971). The approximation (2.11) for test
size is very accurate for $d \le 2$ and $\beta_0^+(v) \le 0.05$, the maximum
relative error being 0.06 at $d = 2.0$ and $\beta_0^+(v) = 0.05$. For this
reason it seems useful to record the coefficients $c_Y$ in Table 2.1.
Turning to the power function approximation, we get

$$E\{\exp(-Y)\} = E\{\exp(-Z)\} = \exp(d^2).$$

Some exact values of $\beta_{-1}^+(v)/\beta_0^+(v)$ were computed from (2.16) and
numerical solution of (2.12). These are given in Table 2.2 together
with the approximation $\exp(d^2)$. Evidently the approximation is
excellent for $d \le 1$ and $\beta_0^+(v) \le 0.05$, but poor for $d$ as large
as 2.

Table 2.1. Coefficients $c_Y$ in (2.10) for the normal
case with mean-shift distance $d$

    d      0.6      0.8      1.0      1.2      1.4      1.6      1.8      2.0
    c_Y    0.70549  0.62800  0.56030  0.49990  0.46646  0.39917  0.35735  0.32037

Table 2.2. Exact and approximate values of $\beta_{-1}^+(v)/\beta_0^+(v)$ for the
normal case with mean-shift distance $d$

    beta_0^+(v)          .05     .02     .01     .005    .001
    d = 1.0   exact      2.72    2.72    2.72    2.72    2.72
              approx.    2.72    2.72    2.72    2.72    2.72
    d = 1.5   exact      8.50    9.20    9.45    9.49
              approx.    9.49    9.49    9.49    9.49
    d = 2.0   exact      45.50   49.50
              approx.    54.60   54.60

As a second example, consider the binomial case

$$f(x, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x} \quad (x = 0, \ldots, n), \qquad g(x, \psi) \equiv f(x, \psi).$$

The case $n = 1$ was examined in detail by Hinkley and Hinkley (1970),
who derived simple recurrence relations for the exact (asymptotic)
distributions of $S^-$, $S^+$, and $\hat\xi$. In the notation of Section 2.1 we
have

$$Y_j = \lambda X_{\xi_0+j} - \mu(n - X_{\xi_0+j}), \qquad Z_j = -\lambda X_{\xi_0-j+1} + \mu(n - X_{\xi_0-j+1}),$$

where $\lambda = \log(\theta/\psi)$ and $\mu = \log\{(1-\psi)/(1-\theta)\}$.
Therefore

$$E\{\exp(-Y)\} = \{\psi^2 \theta^{-1} + (1-\psi)^2 (1-\theta)^{-1}\}^n, \qquad E\{\exp(-Z)\} = \{\theta^2 \psi^{-1} + (1-\theta)^2 (1-\psi)^{-1}\}^n. \tag{2.31}$$
Some exact values of $\beta_{-1}^+(v)/\beta_0^+(v)$ for the case $n = 1$ are given in
Table 2.3 with the corresponding values of the approximation (2.17);
the nominal test sizes $\beta_0^+(v)$ in the table are not quite achieved,
because $S^+$ has a discrete distribution. The approximation (2.17)
seems to be very good when its value is less than 1.6; the square of
(2.31) is equally good as an approximation to $\beta_{-2}^+(v)/\beta_0^+(v)$. Tables
of percentage points and power of the two-sided test statistic may be
found in Hinkley and Hinkley (1970).

Note from (2.18) and (2.31) that the power at $\xi^* = \xi_0 + r$ is
approximately determined by $nr$, as we should expect.


Table 2.3. Exact and approximate values of $\beta_{-1}^+(v)/\beta_0^+(v)$
for the binomial case

    theta   psi     beta_0^+(v)    .05      .02      .01
    0.95    0.80    exact          1.143    1.140    1.141
                    approx.        1.138    1.138    1.138
    0.90    0.70    exact          1.190    1.188    1.192
                    approx.        1.190    1.190    1.190
    0.90    0.50    exact          1.661    1.615    1.630
                    approx.        1.640    1.640    1.640
    0.80    0.40    exact          1.629    1.646    1.654
                    approx.        1.667    1.667    1.667
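The "approx." rows of Table 2.3 can be checked directly from (2.31) and (2.17). The Python sketch below does this for $n = 1$; the function name is hypothetical, and the pairing of $\theta$ and $\psi$ values with table rows is an assumption read from the table header.

```python
def power_ratio_factor(theta, psi, n=1):
    """E{exp(-Z)} from (2.31): the approximate value of
    beta_{-1}^+(v) / beta_0^+(v) given by (2.17)."""
    return (theta ** 2 / psi + (1.0 - theta) ** 2 / (1.0 - psi)) ** n

# (theta, psi) pairs assumed to correspond to rows of Table 2.3
ratios = {pair: power_ratio_factor(*pair)
          for pair in [(0.90, 0.70), (0.90, 0.50), (0.80, 0.40)]}

# by (2.18), r steps of separation raise the factor to the power r
two_step = power_ratio_factor(0.90, 0.70) ** 2
```

These three pairs give approximately 1.190, 1.640 and 1.667. For the pair (0.95, 0.80) the same formula gives roughly 1.141, slightly above the tabulated 1.138.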

The final example is the simple uniform case

$$f(x, \theta) = I(\theta - \tfrac{1}{2} < x < \theta + \tfrac{1}{2}), \qquad g(x, \psi) = f(x, \psi) \quad (\psi > \theta).$$

This is a degenerate case in which distribution support changes; we
note that the convergence results of Section 2.3 hold. The log
likelihood $L(\xi)$ is simply

$$L(\xi) = \begin{cases} 0 & (X_1, \ldots, X_\xi \le \theta + \tfrac{1}{2};\ X_{\xi+1}, \ldots, X_T \ge \psi - \tfrac{1}{2}) \\ -\infty & (\text{otherwise}) \end{cases}$$

if $0 < \psi - \theta < 1$. Therefore we get an interval estimate of $\xi$ which
always covers $\xi_0$. The distribution of the interval length is easily
seen to be the convolution of two identical geometric distributions,
and the average interval length is $1 + 2(1 - \psi + \theta)/(\psi - \theta)$.
3.  GENERAL DISCRIMINANT APPROACH

The likelihood method of inference discussed in Section 2 has
several possible faults, two of which concern us here. First, when
$\theta$ and $\psi$ are unknown the log likelihood $\bar L(\xi)$ can be very inconvenient
in actual data analysis, since it requires calculation of the sequence
of conditional estimates $\{\hat\theta_\xi, \hat\psi_\xi\}$. For example, in the binomial case
where $X$ is 0 or 1 with probabilities $\theta$ and $1-\theta$ or $\psi$ and $1-\psi$,

$$\bar L(\xi) = \xi\{\hat\theta_\xi \log \hat\theta_\xi + (1 - \hat\theta_\xi) \log(1 - \hat\theta_\xi)\} + (T - \xi)\{\hat\psi_\xi \log \hat\psi_\xi + (1 - \hat\psi_\xi) \log(1 - \hat\psi_\xi)\}.$$

This difficulty remains even when one parameter is known. Second, even
when $\theta$ and $\psi$ are known, the likelihood statistics and their
distributional properties (Section 2.1) can be more complicated than we might
wish. A specific case is the generalization of the multivariate normal
example of Section 2.4 in which the covariance matrix changes as well
as the mean vector. Then, rather than use the log likelihood with
quadratic forms in the $X_j$, one might prefer to use a compound linear
discriminant such as (2.30). This is especially true in the normal
case, because a linear discriminant defines random walks with normally
distributed increments, so that appropriate inference statistics will
have known distributions. Suitable choice of linear discriminant in
the normal case has been studied by Anderson and Bahadur (1962).
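The computational burden mentioned for the binomial case can be seen in a short Python sketch that evaluates the profile log likelihood for a 0/1 sequence, re-estimating both proportions at every candidate $\xi$. The function names, the convention $0 \log 0 = 0$, and the restriction to $\xi = 1, \ldots, T-1$ are assumptions of this sketch.

```python
import math

def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if p <= 0.0 else p * math.log(p)

def bernoulli_profile_loglik(xs):
    """Return {xi: profile log likelihood} for xi = 1..T-1, where the
    proportions before and after xi are re-estimated for every xi."""
    T, total = len(xs), sum(xs)
    out, head = {}, 0
    for xi in range(1, T):
        head += xs[xi - 1]
        th = head / xi                       # estimate from X_1..X_xi
        ps = (total - head) / (T - xi)       # estimate from X_{xi+1}..X_T
        out[xi] = (xi * (xlogx(th) + xlogx(1.0 - th))
                   + (T - xi) * (xlogx(ps) + xlogx(1.0 - ps)))
    return out

profile = bernoulli_profile_loglik([0] * 10 + [1] * 10)
xi_bar = max(profile, key=profile.get)
```

Running sums keep the cost linear in $T$ here, but for families without closed-form estimates each $\xi$ would need its own maximisation, which is the inconvenience referred to above.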

These remarks suggest that we consider a general class of discriminants
$d(X)$ to discriminate between $F(x, \theta)$ and $G(x, \psi)$, so that

$$D(\xi) = \sum_{j=1}^{\xi} d(X_j) \tag{3.1}$$

corresponds to $L(\xi)$ or $\bar L(\xi)$. To make any progress we must have

$$\int d(x)\, dF(x, \theta_0) > 0, \qquad \int d(x)\, dG(x, \psi_0) < 0 \tag{3.2}$$

or vice versa, which implies that $d(X)$ be a consistent discriminant.
In practice this may be difficult to guarantee unless we know one of
the parameters. However, a suitable $d(X)$ might be suggested by an
initial look at the data.

One other important situation where $D(\xi)$ might be useful is when
$F(x, \theta_0)$ is known but $G$ is unknown. Then the likelihood cannot be
used, and $d(X)$ is chosen to detect departures from $F(x, \theta_0)$. A
particular example is departure from randomness.

We shall assume (3.2) to hold. Then $D(\xi)$ has the same random
walk structure as $L(\xi)$, which we shall use in Sections 3.1 and 3.2
to generate results corresponding to those in Sections 2.1 and 2.2.
The use of $D(\xi)$ in constructing an alternative to $\bar L(\xi)$ is discussed
in Section 3.3. Some examples are given in Section 3.4.


3.1  Inference from the discriminant $D(\xi)$

The general discriminant $D(\xi)$ has properties that correspond
directly to those of $L(\xi)$, described in Section 2.1, provided (3.2)
holds. We can therefore proceed by analogy with Section 2.1.
Under the model (2.1), $D(\xi)$ satisfies

$$D(\xi) - D(\xi_0) = \begin{cases} \sum_{j=1}^{\xi-\xi_0} Y_{D,j} & (\xi_0 < \xi < T) \\ \sum_{j=1}^{\xi_0-\xi} Z_{D,j} & (1 \le \xi < \xi_0), \end{cases} \tag{3.3}$$

where $Y_{D,j} = d(X_{\xi_0+j})$ and $Z_{D,j} = -d(X_{\xi_0+1-j})$. Thus $D(\xi)$ generates
two random walks each with iid increments of negative mean.

The discriminant statistics for testing $H: \xi = \xi^*$ versus the
one-sided alternatives $H^-: \xi < \xi^*$ and $H^+: \xi > \xi^*$ are respectively

$$S_D^- = \max_{\xi < \xi^*} D(\xi) - D(\xi^*), \qquad S_D^+ = \max_{\xi > \xi^*} D(\xi) - D(\xi^*). \tag{3.4}$$

For the two-sided alternative the test statistic is $S_D = \max(S_D^-, S_D^+)$.
When $H$ is true, so that $\xi^* = \xi_0$, we have from (3.3) and (3.4)

$$S_D^- = U_D = \max_{n \ge 1} \sum_{j=1}^{n} Z_{D,j}, \qquad S_D^+ = V_D = \max_{n \ge 1} \sum_{j=1}^{n} Y_{D,j}, \tag{3.5}$$

and $S_D = \max(U_D, V_D)$.

The discriminant estimate $\hat\xi_D$, which maximizes $D(\xi)$, satisfies the
equation corresponding to (2.7) with subscript $D$ everywhere. Both
$S_D$ and $\hat\xi_D - \xi_0$ are $O_p(1)$. As in Section 2.1 we assume $\xi_0$ and
$T - \xi_0$ to be infinitely large, so that $U_D$ and $V_D$ are maxima of
interminate random walks.
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Let  the  distribution  functions  of  Y^  and  Z D  be  (y)  and 


D 


(z)  respectively,  and  denote  the  distribution  functions  of 


D 


and  V  by  Pu  (u)  and  Pv  (v)  .  Then  by  (3.2),  the  equations 
U  UD  D 

corresponding  to  (2*8)  and  (2.9)  are  immediate.  The  asymptotic  form 

of  P  (v)  is 
D 


(3.6) 


P  (v)  ~  1  -  c  exp (HO  v) 
D  yD  D 


(v  -*■  °°) 


where  to  is  the  positive  solution  to 
YD 


(3.7) 


E{exp(0)Y  Yd)>  =  1  . 


To  proceed  any  further  we  must  obviously  assume  such  solutions  exist. 

Assumption 3.1. The distribution functions $F(x, \theta_0)$, $G(x, \psi_0)$
and the discriminant $d(X)$ are such that (3.2) holds and (3.7) and
(3.10) have positive finite solutions.

If we denote the (asymptotic) test size for $S_D^+$ by $\beta_0^+(v; D)$,
then by (3.6) we have

(3.8)  $\beta_0^+(v; D) = 1 - P_{V_D}(v) \sim c_{Y_D} \exp(-\omega_{Y_D} v)$.

This provides an approximation for small test size, as does (2.11) for
$\beta_0^+(v) = \beta_0^+(v; L)$. The corresponding result for the asymptotic
test size of $S_D^-$ is

(3.9)  $\beta_0^-(u; D) = 1 - P_{U_D}(u) \sim c_{Z_D} \exp(-\omega_{Z_D} u)$

where $\omega_{Z_D}$ is the positive finite solution to

(3.10)  $E\{\exp(\omega_{Z_D} Z_D)\} = 1$.

Solutions for the constants $c_{Y_D}$ and $c_{Z_D}$ are determined by solving
the integral equations for $P_{U_D}(u)$ and $P_{V_D}(v)$ as described in
Section 2.1. Note that (3.7) and (3.10) may not have explicit solutions,
although in most common cases numerical solution for $\omega_{Y_D}$ and
$\omega_{Z_D}$ is quite easy.
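
To indicate what such a numerical solution might look like, here is a
minimal Python sketch that solves $E\{\exp(\omega Y)\} = 1$ by bisection. It
is not taken from the paper: it assumes the moment generating function is
supplied as a callable, is finite on $(0, \text{upper})$, and that a positive
root exists. The name `positive_root` is hypothetical. The two test cases
are normal increments, where the root is known to be $2\mu/\sigma^2$, and an
increment of the form $e - \Delta$ with $e$ standard Laplace.

```python
import math

def positive_root(mgf, upper=None, iterations=200):
    """Positive w with mgf(w) = 1, found by bisection.

    Assumes mgf(w) = E exp(w * Y) is finite on (0, upper) (upper=None means
    no finite bound), that E(Y) < 0, and that a positive root exists.
    """
    lo = 1.0 if upper is None else 0.5 * upper
    while mgf(lo) >= 1.0:       # move left until we are below the root
        lo *= 0.5
    hi = lo
    while mgf(hi) < 1.0:        # move right until we are at or above the root
        hi = 2.0 * hi if upper is None else 0.5 * (hi + upper)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mgf(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Normal increments N(-mu, sigma^2): the root is 2 * mu / sigma^2.
mu, sigma = 1.0, 1.0
w_normal = positive_root(lambda w: math.exp(-mu * w + 0.5 * (sigma * w) ** 2))

# Increment e - delta with e standard Laplace; the mgf is finite for w < 1.
delta = 0.5
w_laplace = positive_root(lambda w: math.exp(-delta * w) / (1.0 - w * w),
                          upper=1.0)
```

Bisection is used here only because it needs nothing beyond the sign of
$E\{\exp(\omega Y)\} - 1$; any standard root-finder would serve equally well.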

The  remarks  in  Section  2.1  about  interval  estimation  apply  here 

also. 

For the discriminant estimator $\hat\xi_D$, as with $\hat\xi = \hat\xi_L$, the
asymptotic distribution is complicated. The special case of linear
discriminants for univariate normal random variables with mean-shift is
discussed by Hinkley (1971). Here we note that the probability of no
misclassification is

(3.11)  $\Pr(\hat\xi_D = \xi_0) = P_{U_D}(0)\, P_{V_D}(0)$

which is determined by equations corresponding to (2.9).
If $\theta$ and $\psi$ are unknown, they can be estimated by

(3.12)  $\hat\theta_D = \hat\theta_{\hat\xi_D}$ and $\hat\psi_D = \hat\psi_{\hat\xi_D}$

with $\hat\theta_\xi$ and $\hat\psi_\xi$ as defined in Section 2.3. These estimates
correspond to $\hat\theta$ and $\hat\psi$, and have the same asymptotic properties;
see Theorems 3.1 and 3.2 in Section 3.3. Hence the percentage points of
$S_D^+$, for example, can be consistently estimated, just as those of $S^+$
can.


3.2  Power and efficiency of discriminant statistics


We introduced the discriminant $D(\xi)$ because it may be easier to
use than the likelihood in some cases. However, simplicity may be
gained at the expense of power. Therefore we need to examine the power
of statistics such as $S_D^+$ and compare with the equivalent likelihood
statistics. Here we look at approximations of the type discussed in
Section 2.2.

Following the arguments leading up to (2.18) in Section 2.2, we
find the (asymptotic) power of $S_D^+$ at $\xi^* = \xi_0 - r$ is

(3.13)  $\beta_r^+(v; D) \sim \beta_0^+(v; D)\,[E\{\exp(-\omega_{Y_D} Z_D)\}]^r$

as $v \to \infty$ or $\|F - G\| \to 0$ with $r > 0$. We shall assume the
right-hand side of (3.13) to exist, although this need not always be the
case. The corresponding expression for power of $S_D^-$ at $\xi^* = \xi_0 + r$ is

(3.14)  $\beta_r^-(v; D) \sim \beta_0^-(v; D)\,[E\{\exp(-\omega_{Z_D} Y_D)\}]^r$

for $r > 0$.

In particular cases one can compare (3.13) with (2.18) to see how
well $D(\xi)$ performs relative to $L(\xi)$, for large $v$ or small $\|F - G\|$.
Aside from this, there are several ways to measure relative efficiency
of $S_D^+$, say. One measure that is particularly relevant to the model
(2.2) is the analogue of Pitman efficiency expressed in terms of
observation-spacing.


Assume that in (2.2) the spacing $u_{i+1} - u_i = \varepsilon$, a constant
depending on $D(\cdot)$. Now suppose we wish to test $H: \eta = \eta^*$ against
$H^+: \eta > \eta^*$ with test size $\alpha$ and fixed power $\pi$ at
$\eta = \eta^* + \gamma$. Then as $\alpha \to 0$ or $\|F - G\| \to 0$, we see from
(3.13) that $\varepsilon \to 0$ also. The magnitude of $\varepsilon$ depends on
$D(\cdot)$, and from (2.18) and (3.13) we get

(3.15)  $e_D^+ = \lim_{\varepsilon_L \to 0} \frac{\varepsilon_D}{\varepsilon_L} = \frac{\log E\{\exp(-\omega_{Y_D} Z_D)\}}{\log E\{\exp(-Z)\}}$

independently of $\alpha$ and $\pi$. In view of the examples in Section 2.4
we might expect that $e_D^+$ approximates $\varepsilon_D/\varepsilon_L$ well for
useful values of $\alpha$ and $\|F - G\|$. Note that if $\varepsilon_D/\varepsilon_L$
is less than one, then the power function of $S_D^+$ is totally dominated
by that of $S^+$.

The corresponding efficiency for $S_D^-$, with fixed power at
$\eta = \eta^* - \gamma$, is

$e_D^- = \frac{\log E\{\exp(-\omega_{Z_D} Y_D)\}}{\log E\{\exp(-Y)\}}$.

For the two-sided test statistic $S_D = \max(S_D^-, S_D^+)$ the situation is a
little more complicated. Without going into details, it can be shown
that if $S_D$ and $S = S_L$ have equal powers at $\eta = \eta_0 + \gamma_1$ and
$\eta = \eta_0 - \gamma_2$, with lower bounds $\pi_1$ and $\pi_2$ respectively,
then the required spacings $\varepsilon_D$ and $\varepsilon_L$ satisfy

$\lim_{\alpha \to 0} \frac{\varepsilon_D}{\varepsilon_L} = \min(e_D^-, e_D^+)$.

One special class of interesting problems is that of location shift
with $F(x, \theta) = F(x - \theta)$ and $G(x, \psi) = F(x - \psi)$. Here a natural
class of discriminators $D(\xi)$ is the linear class with

$d(X) = v'(X - a)$.

For simplicity we shall only consider the univariate case, and without
loss of generality we assume $\theta > \psi$ so that $d(X) = X - a$. By (3.2),
then, we must have $\theta > a > \psi$. For continuous $F$, the random walk
increments $Y_D$ and $Z_D$ defined in (3.3) have distribution functions
$F(y - \psi + a)$ and $1 - F(-z - \theta + a)$ respectively. If we now define
$\rho(\lambda)$ to be the positive solution of


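(3.16)  $\int \exp(\omega x)\, dF(x) = \exp(\omega\lambda)$  $(\lambda > 0)$,
        $\int \exp(-\omega x)\, dF(x) = \exp(-\omega\lambda)$  $(\lambda < 0)$,

then we find from (3.7) and (3.10) that

$\omega_{Y_D} = \rho(a - \psi)$,  $\omega_{Z_D} = \rho(a - \theta)$.

It follows that

(3.17)  $E\{\exp(-\omega_{Y_D} Z_D)\} = \exp\{(\theta - \psi)\,\rho(a - \psi)\}$,
        $E\{\exp(-\omega_{Z_D} Y_D)\} = \exp\{(\theta - \psi)\,\rho(a - \theta)\}$.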
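
The step from (3.16) to the first line of (3.17) can be spelled out as
follows. This is a sketch under the assumption that an observation before
the change can be written $X = \theta + e$ with $e$ distributed as $F$, so
that $Z_D = -(X - a) = a - \theta - e$, and that $\omega_{Y_D} = \rho(a - \psi)$
satisfies the first equation of (3.16) with $\lambda = a - \psi > 0$. The
symbol $e$ is introduced here for illustration only.

```latex
\begin{align*}
E\{\exp(-\omega_{Y_D} Z_D)\}
  &= E[\exp\{-\omega_{Y_D}(a - \theta - e)\}] \\
  &= \exp\{\omega_{Y_D}(\theta - a)\}\, E\{\exp(\omega_{Y_D} e)\} \\
  &= \exp\{\omega_{Y_D}(\theta - a)\}\, \exp\{\omega_{Y_D}(a - \psi)\} \\
  &= \exp\{(\theta - \psi)\, \rho(a - \psi)\}.
\end{align*}
```

The second line of (3.17) follows in the same way, using $Y_D = e + \psi - a$
and the second equation of (3.16) with $\lambda = a - \theta < 0$.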


Substitution from (3.17) into (3.15) shows the spacing efficiency $e_D^+$
for $S_D^+$, for example, to be

(3.18)  $e_D^+ = \frac{(\theta - \psi)\,\rho(a - \psi)}{\log E\{\exp(-Z)\}}$.

If $a \ne (\theta + \psi)/2$, it is possible for (3.18) to be greater than one.
An example is given in Section 3.4. When $a = (\theta + \psi)/2$ we have the
normal theory, or least-squares, discriminant $d(X) = X - (\theta + \psi)/2$. A
little calculation shows that the limiting value of (3.18) as
$\psi \to \theta$ is then the efficiency of least-squares estimation of
$\theta$, as we would expect; cf. (2.19).

We should stress that these results are limiting results, and rely
on the existence of quantities such as $\omega_{Y_D}$. Some specific examples
are given in Section 3.4 to illustrate the usefulness, and limitations,
of the results as practical approximations.

Rather than look at properties of test statistics based on $D(\xi)$,
we might be more interested in how well the data can be classified.
This might involve the comparison of (3.11) and (2.14), the probabilities
of correct classification. We have not derived any suitable limiting
results for such comparisons, but for the location-shift problem one
would expect the best linear discriminant to be that which best
discriminates in the classical single-observation discriminant problem. An
example is given in Section 3.4.


3.3  An  asymptotically  efficient  two-stage  procedure 


Suppose that at least one of $\theta$ and $\psi$ is unknown, and that we
have used $D(\xi)$ in preference to $L(\xi)$ in order to estimate and make
inference about $\xi$. Then our estimate $\hat\xi_D$ defines the estimates
$\hat\theta_D$ and $\hat\psi_D$, which we may have used to estimate inference
properties. Another use for $\hat\theta_D$ and $\hat\psi_D$ is in construction of
the pseudo-likelihood

$\tilde L_D(\xi) = \sum_{j=1}^{\xi} \log f(X_j, \hat\theta_D) + \sum_{j=\xi+1}^{T} \log g(X_j, \hat\psi_D)$
$\qquad = \text{sample constant} + \sum_{j=1}^{\xi} \log\{f(X_j, \hat\theta_D)/g(X_j, \hat\psi_D)\}$.

In a situation where $L(\xi)$ is difficult to work with, use of $D(\xi)$
and then $\tilde L_D(\xi)$ might be preferable to the use of $L(\xi)$. Of course,
if $\hat\theta_D$ and $\hat\psi_D$ indicate that $D(\xi)$ is very efficient, then
$\tilde L_D(\xi)$ would not be needed.
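
The two stages can be sketched in Python for one simple case. The sketch
below is an illustration only and makes several assumptions not fixed by
the text: the observations are normal with unit variance and unknown means,
the first-stage discriminant is $d(X) = X - a$ with a user-chosen cut-off
$a$, the parameter estimates are the segment means, and $\xi$ ranges over
$1, \ldots, T-1$. The function names are hypothetical.

```python
import numpy as np

def argmax_changepoint(increments):
    """Maximizer over xi = 1..T-1 of the partial sums of `increments`."""
    path = np.cumsum(increments)[:-1]
    return int(np.argmax(path)) + 1

def two_stage_normal(x, a):
    """Stage 1: linear discriminant d(X) = X - a. Stage 2: pseudo-likelihood.

    Assumes unit-variance normal observations whose mean drops at the
    change-point.
    """
    x = np.asarray(x, dtype=float)
    xi_d = argmax_changepoint(x - a)       # discriminant estimate
    theta_hat = x[:xi_d].mean()            # mean estimate before the change
    psi_hat = x[xi_d:].mean()              # mean estimate after the change
    # log{f(x, theta_hat) / g(x, psi_hat)} for unit-variance normal densities
    log_ratio = (theta_hat - psi_hat) * (x - 0.5 * (theta_hat + psi_hat))
    xi_tilde = argmax_changepoint(log_ratio)
    return xi_d, theta_hat, psi_hat, xi_tilde

x = [2.1, 1.9, 2.0, 2.2, 0.1, -0.1, 0.0, 0.2, -0.2, 0.1]
xi_d, theta_hat, psi_hat, xi_tilde = two_stage_normal(x, a=1.5)
```

In this toy sample both stages select the same change-point. The second
stage matters when the first-stage cut-off $a$ is poorly placed relative
to the two means.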

Let the pseudo-likelihood statistics corresponding to $\hat\xi$ and $S$
be denoted by $\tilde\xi_D$ and $\tilde S_D$. Then $\tilde\xi_D$ and $\tilde S_D$ are
asymptotically efficient (relative to $\hat\xi$ and $S$) by the following
theorem.


Theorem 3.1. If Assumption 1 and (3.2) hold, then as $T \to \infty$,
(i) $\hat\theta_D - \theta_0$ and $\hat\psi_D - \psi_0$ are $o_p(1)$,
(ii) $\tilde\xi_D - \hat\xi = o_p(1)$ and (iii) $\tilde S_D - S = o_p(1)$
if $\xi^* - \xi_0 = O(1)$.


These results may be proved in much the same way as Theorem 2.1, using
the fact that $\hat\xi_D - \xi_0 = O_p(1)$. We omit the details here. A stronger
result for $\hat\theta_D$ and $\hat\psi_D$, corresponding to Theorem 2.2, is


Theorem 3.2. If the conditions of Theorem 3.1 hold, then
$\sqrt{\xi_0}\,(\hat\theta_D - \theta_0)$ and $\sqrt{T - \xi_0}\,(\hat\psi_D - \psi_0)$ have
the same limiting normal distributions as
$\sqrt{\xi_0}\,(\hat\theta_{\xi_0} - \theta_0)$ and $\sqrt{T - \xi_0}\,(\hat\psi_{\xi_0} - \psi_0)$.

The proof of Theorem 2.2 can be applied with $\hat\xi_D$ replacing $\hat\xi$.
Note that Theorem 3.2 holds if $\xi_0$ is replaced by $\hat\xi_D$ or $\tilde\xi_D$
since the relative errors in these estimates are $o(1)$.

Some empirical results on the use of $\tilde L_D(\xi)$ in finite samples from
univariate normal distributions are described by Hinkley (1971, Section
3.2).

A more extreme situation where the pseudo-likelihood might be of
use is the case of unknown $F$ and (or) $G$. Given the estimate $\hat\xi_D$
from $D(\xi)$, it is possible to obtain a consistent estimate of
$\ell(X_j) = \log\{f(X_j, \theta_0)/g(X_j, \psi_0)\}$ for smooth densities $f$ and
$g$; for example, estimates can be based on Parzen density estimates
(Parzen, 1962). If the consistent estimate for $\ell(X_j)$ were
$\hat\ell(X_j)$, then the pseudo-log likelihood

$\tilde L(\xi) = \sum_{j=1}^{\xi} \hat\ell(X_j)$

would generate statistics asymptotically equivalent to $\hat\xi$ and $S$.
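
A minimal Python sketch of this idea follows, under assumptions that go
beyond the text: a Gaussian kernel with a fixed bandwidth `h` chosen by the
user, density estimates built separately from the observations on each
side of a first-stage estimate `xi_d`, and $\xi$ restricted to
$1, \ldots, T-1$. The function names are hypothetical, and no claim is made
about how the bandwidth should be chosen in practice.

```python
import numpy as np

def parzen_density(points, sample, h):
    """Gaussian-kernel (Parzen) density estimate from `sample` at `points`."""
    z = (points[:, None] - sample[None, :]) / h
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    return kernel_sum / (len(sample) * h * np.sqrt(2.0 * np.pi))

def parzen_pseudo_changepoint(x, xi_d, h):
    """Maximize the cumulative estimated log density ratio over xi = 1..T-1."""
    x = np.asarray(x, dtype=float)
    f_hat = parzen_density(x, x[:xi_d], h)   # estimate of f from X_1..X_xi_d
    g_hat = parzen_density(x, x[xi_d:], h)   # estimate of g from the rest
    # Estimated log ratio; very distant points could underflow to log(0).
    ell_hat = np.log(f_hat) - np.log(g_hat)
    path = np.cumsum(ell_hat)[:-1]           # pseudo-log likelihood path
    return int(np.argmax(path)) + 1

x = [2.1, 1.9, 2.0, 2.2, 0.1, -0.1, 0.0, 0.2, -0.2, 0.1]
xi_tilde = parzen_pseudo_changepoint(x, xi_d=4, h=0.5)
```

Each observation contributes to its own density estimate here, which is
the simplest choice but not necessarily the best one in small samples.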

In  practice  one  might  restrict  F  and  G  to  some  suitable  finite 
class  among  which  discrimination  is  possible.  For  example,  the  class 
might  contain  the  normal,  gamma,  log  normal  and  Cauchy  distributions. 

The  appropriate  generalization  of  Theorem  3.1  could  presumably  be  proved 
without  much  additional  effort.  One  useful  area  of  application  might  be 
in  models  for  departure  from  randomness,  where  the  alternative  to 
randomness is not specified exactly. Then $G(x, \psi)$ would be estimated
or selected conditional on $\hat\xi_D$.


3.4  Some examples


The first example is the Laplace mean-shift case where

$f(x, \theta) = \tfrac{1}{2} \exp(-|x - \theta|)$,  $g(x, \psi) = f(x, \theta - 2\Delta)$.

The likelihood random walk increments $Y$ and $Z$ defined at (2.4)
have the distribution function

(3.19)  $H_Y(y) = \begin{cases} 0 & (y < -2\Delta) \\ \tfrac{1}{2} & (y = -2\Delta) \\ 1 - \tfrac{1}{2} \exp(-\tfrac{1}{2} y - \Delta) & (-2\Delta < y < 2\Delta) \\ 1 & (2\Delta \le y) \end{cases}$

and hence

(3.20)  $E\{\exp(-Y)\} = E\{\exp(-Z)\} = \tfrac{2}{3} \exp(2\Delta) + \tfrac{1}{3} \exp(-4\Delta)$.


Now suppose that we use the linear discriminant with

$d(X) = X - \theta - \Delta$.

Then from Feller (1966, Chapter 12) we have the exact result that

(3.21)  $\beta_0^+(v; D) = \beta_0^-(v; D) = (1 - \omega_{Y_D}) \exp(-\omega_{Y_D} v)$  $(v > 0)$

and the equation (3.7) becomes

$1 - \omega_{Y_D}^2 = \exp(-\omega_{Y_D} \Delta)$.

If we write $\omega_{Y_D} = \omega_{Z_D} = \omega(\Delta)$, then we get both

(3.22)  $\log E\{\exp(-\omega_{Y_D} Z_D)\} = \log E\{\exp(-\omega_{Z_D} Y_D)\} = 2\Delta\,\omega(\Delta)$.

Table 3.1 gives some values of the spacing efficiency $e_D^+$ $(= e_D^-)$
calculated from (3.15), (3.20) and (3.22). These values are quite low,
but the simple exact distribution (3.21) contrasts sharply with the
corresponding likelihood distributions, which involve solution of (2.8)
with $H_Y(y)$ given by (3.19).

Table 3.1.  Spacing efficiency for $d(X) = X - \theta - \Delta$ in the
Laplace case with mean-shift $2\Delta$.

    $\Delta$      0+     0.1    0.2    0.3    0.4    0.5
    $e_D^+$       0.50   0.54   0.58   0.62   0.67   0.73
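
The entries of Table 3.1 can be checked numerically. The Python sketch
below assumes that (3.20) reads $\tfrac{2}{3}\exp(2\Delta) + \tfrac{1}{3}\exp(-4\Delta)$
and that $\omega(\Delta)$ is the root in $(0, 1)$ of
$1 - \omega^2 = \exp(-\omega\Delta)$, which is what a direct calculation for
the standard Laplace distribution gives. The function names are
hypothetical.

```python
import math

def omega(delta, iterations=200):
    """Root in (0, 1) of 1 - w**2 = exp(-w * delta), by bisection."""
    def g(w):
        return 1.0 - w * w - math.exp(-w * delta)
    lo, hi = min(0.5 * delta, 0.5), 1.0    # g(lo) > 0 > g(hi) for delta > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def spacing_efficiency(delta):
    """Ratio (3.15) with numerator from (3.22) and denominator from (3.20)."""
    numerator = 2.0 * delta * omega(delta)
    denominator = math.log(2.0 / 3.0 * math.exp(2.0 * delta)
                           + 1.0 / 3.0 * math.exp(-4.0 * delta))
    return numerator / denominator

table = {d: round(spacing_efficiency(d), 3) for d in (0.1, 0.2, 0.3, 0.4, 0.5)}
```

Under these assumptions the computed values agree with the tabulated ones
to within about 0.01, and the ratio tends to 0.5 as $\Delta \to 0$.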


As a second example consider the univariate normal case with mean
and variance changing. That is, let

$f(x, \theta) = \phi(x)$ and $g(x, \psi) = \lambda\phi\{\lambda(x + 2\Delta)\}$.

The likelihood increments $Y$ and $Z$ are quadratic in $X$, and simple
calculation shows that

(3.23)  $E\{\exp(-Z)\} = (2\lambda^2 - \lambda^4)^{-1/2} \exp\{4\lambda^2\Delta^2/(2 - \lambda^2)\}$,

which is only valid for $\lambda < \sqrt{2}$. Thus the approximation (2.18) only
exists for $\lambda < \sqrt{2}$. For the linear discriminant $d(X) = X - \Delta$, a
simple calculation via (3.7) gives

(3.24)  $E\{\exp(-\omega_{Y_D} Z_D)\} = \exp\{2\Delta^2\lambda^2(1 + \lambda^2)\}$.

When $\Delta = 1$, the spacing efficiency $e_D^+$ given by (3.15), (3.23) and
(3.24) has values 0.95 and 0.95 for $\lambda = 0.9$ and $\lambda = 1.1$. Of
course $d(X) = X - \Delta$ is the likelihood discriminant when $\lambda = 1$, and
so should have high efficiency for $\lambda$ close to 1.

In the normal case with constant variance ($\lambda = 1$ in the previous
example), the likelihood $L(\xi)$ uses the linear discriminant
$d(X) = X - \Delta$; see Section 2.4. Consider the generalization
$d(X) = X - \delta$, for which $Y_D$ and $Z_D$ in (3.3) are $N(-\delta, 1)$ and
$N(-2\Delta + \delta, 1)$ respectively. Then we get

$E\{\exp(-\omega_{Y_D} Z_D)\} = \exp\{4\Delta(2\Delta - \delta)\}$  $(0 < \delta < 2\Delta)$.

Hence for $0 < \delta < \Delta$, $S_D^+$ is more powerful than the likelihood
statistic $S^+$, while $S_D^-$ is less powerful than $S^-$; the reverse is
true for $\Delta < \delta < 2\Delta$. For two-sided tests, however, the likelihood
statistic appears preferable: some numerical results are given by
Hinkley (1971).
Notice  that  ! 


The case $\delta = \Delta$ also gives the lowest misclassification probability
for this problem.


4.  GENERAL REMARKS


One feature of the change-point problem is the complicated nature
of the distribution theory for inferential statistics. The approximations
derived in this paper for likelihood ratio tests may simplify data
analysis, and the approximations discussed in Section 3.2 will give some
idea as to efficiency loss.

The two-stage procedure mentioned in Section 3.3 provides the
opportunity to clean up an initial identification of the change-point,
as it were. Its value lies in circumventing calculation of a complicated
likelihood function with all the conditional estimates of parameters. In
practice initial estimates of $\theta$ and $\psi$ might be made by omitting the
data near the change-point, if such a judgment is possible.

One could regard the linear discriminant as a non-distributional
equivalent of the likelihood for location-shift problems. The results
of Section 3 cover certain aspects of the discriminant, but not the
important one of robustness. We have not investigated this very much,
but calculations are easy for one special case. Suppose we use
$d(X) = X - \theta - \Delta$, assuming $X$ to be normal with mean-shift $2\Delta$
and variance 1, and carry out a two-sided test of $H: \xi = \xi^*$ with
assumed size 0.050. If in fact $X$ has a Laplace distribution with
mean-shift $2\Delta$ and variance 1, the true rejection probability is 0.057.
For assumed size 0.010, the true size is 0.015. (The true distribution
of $S_D$ in this case follows from calculations for the Laplace example
in Section 3.4.) These calculations indicate reasonable robustness
against long-tailed symmetric distributions. The effects of asymmetry
and of dependence between observations are probably most conveniently
determined by Monte Carlo studies, which we have not undertaken.

Models  with  more  than  one  change-point  are  considerably  more  difficult 
to  analyze,  although  the  formal  theory  of  likelihood  inference  generalizes 
quite  easily.  If  the  number  of  change-points  is  known,  and  large  numbers 
of  observations  are  taken  between  each  one,  one  might  be  able  to  break  the 
data  into  segments  each  with  one  change.  For  multiple  location-shifts, 
the  linear  discriminant  is  appropriate  because  it  can  be  used  as  a 
sequential  detector  (Page,  1954;  Hinkley,  1971). 

When  the  number  of  change-points  is  not  known,  sequential  analysis 
with  the  discriminant  is  still  appropriate  but  not  necessarily  good. 

Sclove,  in  an  unpublished  Stanford  University  Technical  Report,  has 
considered  the  use  of  finite  moving  averages.  This  type  of  analysis 
was indicated by Chernoff and Zacks (1964), who introduced prior
distributions on the change-points. One relevant class of models to consider
is that where $p \ge 2$ populations can generate observations and the
sequence  of  populations  forms  a  Markov  chain,  possibly  with  diagonal 
elements of the transition probability matrix close to unity.